
    The Impact of RDMA on Agreement

    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or the other. With Byzantine failures, we give an algorithm that only requires $n \geq 2f_P + 1$ processes (where $f_P$ is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires $n \geq f_P + 1$ processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions. Comment: Full version of PODC'19 paper, strengthened broadcast algorith

    The FIDS Theorems: Tensions between Multinode and Multicore Performance in Transactional Systems

    Traditionally, distributed and parallel transactional systems have been studied in isolation, as they targeted different applications and experienced different bottlenecks. However, modern high-bandwidth networks have made the study of systems that are both distributed (i.e., employ multiple nodes) and parallel (i.e., employ multiple cores per node) necessary to truly make use of the available hardware. In this paper, we study the performance of these combined systems and show that there are inherent tradeoffs between a system's ability to have fast and robust distributed communication and its ability to scale to multiple cores. More precisely, we formalize the notions of a \emph{fast deciding} path of communication to commit transactions quickly in good executions, and \emph{seamless fault tolerance} that allows systems to remain robust to server failures. We then show that there is an inherent tension between these two natural distributed properties and well-known multicore scalability properties in transactional systems. Finally, we show positive results: it is possible to construct a parallel distributed transactional system if any one of the properties we study is removed.

    Brief Announcement: Survey of Persistent Memory Correctness Conditions

    In this brief paper, we survey existing correctness definitions for concurrent persistent programs.

    Implicit Decomposition for Write-Efficient Connectivity Algorithms

    The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an $o(n)$-sized "implicit decomposition" of a bounded-degree graph $G$ on $n$ nodes, which combined with read-only access to $G$ enables fast answers to connectivity and biconnectivity queries on $G$. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on $m$ edges, we also provide the first $o(m)$ writes and $O(m)$ operations parallel algorithms for connectivity and biconnectivity. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry.

    Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

    Over the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, partly because of the complexity of the analysis needed, and partly because of the difficulty of obtaining relevant asymptotic bounds: when the analysis takes into account important practical factors, such as contention, it is difficult or even impossible to prove desirable bounds. In this paper, we show that considering structured concurrency or relaxed concurrency models can enable establishing strong bounds, also for contention. To this end, we first present a dynamic relaxed counter data structure that indicates the non-zero status of the counter. Our data structure extends a recently proposed data structure, called SNZI, allowing our structure to grow dynamically in response to the increasing degree of concurrency in the system. Using the dynamic SNZI data structure, we then present a concurrent data structure for series-parallel directed acyclic graphs (sp-dags), a key data structure widely used in the implementation of modern parallel programming languages. The key component of sp-dags is an in-counter data structure that is an instance of our dynamic SNZI. We analyze the efficiency of our concurrent sp-dags and in-counter data structures under the nested-parallel computing paradigm. This paradigm offers a structured model for concurrency. Under this model, we prove that our data structures require amortized O(1) shared memory steps, including contention. We present an implementation and an experimental evaluation that suggests that the sp-dags data structure is practical and can perform well in practice.
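
    To make the idea of a non-zero indicator concrete, here is a minimal sequential sketch in Python. It is not the paper's data structure: it omits all concurrency control and the contention analysis, and the names `Node`, `arrive`, `depart`, `grow`, and `query` are illustrative assumptions. It only shows the filtering idea commonly associated with SNZI-style trees, in which a child touches its parent only when its own count moves between zero and non-zero.

```python
class Node:
    """One node of a hypothetical, single-threaded non-zero indicator tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.count = 0  # arrivals minus departures seen at this node

    def arrive(self):
        # Propagate upward only on the 0 -> 1 transition.
        if self.count == 0 and self.parent is not None:
            self.parent.arrive()
        self.count += 1

    def depart(self):
        if self.count == 0:
            raise ValueError("depart without a matching arrive")
        self.count -= 1
        # Propagate upward only on the 1 -> 0 transition.
        if self.count == 0 and self.parent is not None:
            self.parent.depart()

    def grow(self):
        # Assumption: "growing dynamically" is modeled as adding a child.
        return Node(parent=self)


def query(root):
    """Report whether any arrival in the tree is still outstanding."""
    return root.count > 0


root = Node()
left, right = root.grow(), root.grow()
left.arrive()
left.arrive()   # second arrival stays local to `left`
right.arrive()
print(query(root), root.count)
```

    In this sketch the root sees at most one increment per child, however many arrivals the child absorbs, which is the intuition behind spreading contention across the tree.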

    Efficient and Adaptively Secure Asynchronous Binary Agreement via Binding Crusader Agreement

    We present a new abstraction based on crusader agreement called \textit{Binding Crusader Agreement} (BCA) for solving binary consensus in the \textit{asynchronous} setting against an \textit{adaptive} adversary. BCA has the validity, agreement, and termination properties of crusader agreement in addition to a new property called \textit{binding}. Binding states that before the first non-faulty party terminates, there is a value $v \in \{0,1\}$ such that no non-faulty party can output the value $v$ in any continuation of the execution. We believe that reasoning about binding explicitly, as a first-order goal, greatly helps algorithm design, clarity, and analysis. Using our framework, we solve several versions of asynchronous binary agreement against an adaptive adversary in a simple and modular manner that either improves or matches the efficiency of state-of-the-art solutions. We do this via new BCA protocols, given a strong common coin, and via new Graded BCA protocols given an $\epsilon$-good common coin. For crash failures, we reduce the expected time to terminate and we provide termination bounds that are linear in the goodness of the common coin. For Byzantine failures, we improve the expected time to terminate in the computational setting with threshold signatures, and match the state of the art in the information theoretic setting, both with a strong common coin and with an $\epsilon$-good common coin.
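
    As a rough illustration of the properties named above, the following Python sketch checks a single vector of non-faulty outputs against the usual textbook formulation of crusader agreement. The abstract does not spell these definitions out, so the formulation is an assumption here: each party outputs 0, 1, or an undecided value (written `BOT`), no two parties output different bits, and unanimous inputs force that bit as output. Binding is a statement about all continuations of an execution, so it cannot be checked on one output vector and is left out.

```python
BOT = None  # assumed encoding of the undecided output, often written ⊥


def satisfies_agreement(outputs):
    """No two non-faulty parties output different bits (BOT is allowed)."""
    decided = {o for o in outputs if o is not BOT}
    return len(decided) <= 1


def satisfies_validity(inputs, outputs):
    """If all non-faulty inputs equal v, every non-faulty output must be v."""
    if len(set(inputs)) == 1:
        return all(o == inputs[0] for o in outputs)
    return True  # mixed inputs place no constraint under this property


# Mixed inputs: some parties may stay undecided, but bits must not conflict.
print(satisfies_agreement([1, BOT, 1]))          # allowed
print(satisfies_agreement([0, 1, BOT]))          # conflicting bits
print(satisfies_validity([1, 1, 1], [1, BOT, 1]))  # BOT despite unanimity
```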

    Frugal Byzantine Computing

    Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using 3f+1 replicas is uneconomical (f denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to 2f+1 and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the second goal simultaneously is challenging. We first address this challenge for the problem of broadcasting messages reliably. We study two variants of this problem, Consistent Broadcast and Reliable Broadcast, typically considered very close. Perhaps surprisingly, we establish a separation between them in terms of signatures required. In particular, we show that Consistent Broadcast requires at least 1 signature in some execution, while Reliable Broadcast requires O(n) signatures in some execution. We present matching upper bounds for both primitives within constant factors. We then turn to the problem of consensus and argue that this separation matters for solving consensus with Byzantine failures: we present a practical consensus algorithm that uses Consistent Broadcast as its main communication primitive. This algorithm works for n = 2f+1 and avoids signatures in the common case; these properties have not been simultaneously achieved previously. Overall, our work approaches Byzantine computing in a frugal manner and motivates the use of Consistent Broadcast, rather than Reliable Broadcast, as a key primitive for reaching agreement.
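
    The replica arithmetic in this abstract can be worked through with a small Python sketch. It only evaluates the two bounds quoted above (3f+1 and 2f+1); the function names and the labels "traditional" and "frugal" are illustrative assumptions, not terminology from the paper.

```python
def min_replicas(f, model):
    """Smallest replica count tolerating f Byzantine processes under a bound.

    `model` is a hypothetical label: "traditional" uses 3f+1 and
    "frugal" uses 2f+1, the two figures quoted in the abstract.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    bounds = {"traditional": 3 * f + 1, "frugal": 2 * f + 1}
    return bounds[model]


def replicas_saved(f):
    """Difference between the two bounds, which simplifies to f."""
    return min_replicas(f, "traditional") - min_replicas(f, "frugal")


for f in (1, 2, 5):
    print(f, min_replicas(f, "traditional"), min_replicas(f, "frugal"))
```

    Since (3f+1) - (2f+1) = f, the saving grows linearly with the number of failures to be tolerated.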

    On the Round Complexity of Asynchronous Crusader Agreement

    We present new lower and upper bounds on the number of communication rounds required for asynchronous Crusader Agreement (CA) and Binding Crusader Agreement (BCA), two primitives that are used for solving binary consensus. We show results for the information theoretic and authenticated settings. In doing so, we present a generic model for proving round complexity lower bounds in the asynchronous setting. In some settings, our attempts to prove lower bounds on round complexity fail. Instead, we show new, tight, rather surprising round complexity upper bounds for Byzantine fault tolerant BCA with and without a PKI setup.

    Multiversion Concurrency with Bounded Delay and Precise Garbage Collection

    In this paper, we are interested in bounding the number of instructions taken to process transactions. The main result is a multiversion transactional system that supports constant delay (extra instructions beyond running in isolation) for all read-only transactions, delay equal to the number of processes for writing transactions that are not concurrent with other writers, and lock-freedom for concurrent writers. The system supports precise garbage collection in that versions are identified for collection as soon as the last transaction releases them. As far as we know, these are the first results that bound delays for multiple readers and even a single writer. The approach is particularly useful in situations where read transactions dominate write transactions, or where write transactions come in as streams or batches and can be processed by a single writer (possibly in parallel). The approach is based on using functional data structures to support multiple versions, and an efficient solution to the Version Maintenance (VM) problem for acquiring, updating and releasing versions. Our solution to the VM problem is precise, safe and wait-free (PSWF). We experimentally validate our approach by applying it to balanced tree data structures for maintaining ordered maps. We test the transactional system using multiple algorithms for the VM problem, including our PSWF VM algorithm, and implementations with weaker guarantees based on epochs, hazard pointers, and read-copy-update. To evaluate the functional data structure for concurrency and multi-versioning, we implement batched updates for functional tree structures and compare the performance with state-of-the-art concurrent data structures for balanced trees. The experiments indicate our approach works well in practice over a broad set of criteria.
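
    The acquire, update, and release interface of the Version Maintenance problem, together with the "precise" collection rule, can be sketched sequentially in Python. This is not the PSWF algorithm: it is single-threaded, uses plain reference counts, and provides no wait-freedom. The class name `VersionManager` and its methods are assumptions made for illustration.

```python
class VersionManager:
    """Hypothetical single-threaded model of version maintenance."""

    def __init__(self, initial):
        self._current = 0
        self._data = {0: initial}   # version id -> immutable snapshot
        self._readers = {0: 0}      # version id -> active transactions
        self.collected = []         # ids collected so far, in order

    def acquire(self):
        """Pin the current version for a transaction and return it."""
        v = self._current
        self._readers[v] += 1
        return v, self._data[v]

    def release(self, v):
        """Unpin a version; collect it if it is stale and unused."""
        self._readers[v] -= 1
        self._maybe_collect(v)

    def set(self, value):
        """Install a new current version (single writer assumed)."""
        old = self._current
        self._current = old + 1
        self._data[self._current] = value
        self._readers[self._current] = 0
        self._maybe_collect(old)
        return self._current

    def _maybe_collect(self, v):
        # Precise rule: collect as soon as the last holder lets go.
        if v != self._current and self._readers[v] == 0:
            del self._data[v]
            del self._readers[v]
            self.collected.append(v)


vm = VersionManager("a")
v0, snapshot = vm.acquire()   # a reader pins version 0
vm.set("b")                   # version 0 is stale but still pinned
vm.release(v0)                # last holder releases, so 0 is collected
print(vm.collected)
```

    Storing each version as an immutable snapshot mirrors the abstract's use of functional data structures, where readers never observe in-place updates.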